Statistical Machine Translation: Rapid Development with Limited Resources
Abstract
We describe an experiment in rapid development of a statistical machine translation (SMT) system from scratch, using limited resources: under this heading we include not only training data, but also computing power, linguistic knowledge, programming effort, and absolute time.
Similar resources
The Temple Translator's Workstation Project
The Temple project has developed an open multilingual architecture and software support for rapid development of extensible Machine Translation functionalities. The targeted languages are those for which Natural Language Processing and human resources are scarce or difficult to obtain. The goal is to support rapid development of machine translation functionalities in a very short time with lim...
The Temple Web Translator
New Web sites in foreign languages are appearing every day, and language barriers threaten to atomize the World Wide Web into closed linguistic communities. The Temple project has developed an open multilingual architecture and software support for rapid development of machine translation systems for assimilation purposes. The targeted languages are those for which natural language processing an...
Resource Report: Building Parallel Text Corpora for Multi-Domain Translation System
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. However, manual translations are very costly, and the amount of known parallel text is limited. Hence, our research started with creating and collecting a large amount of parallel text resources for Indonesian-English. We describe in this paper the creation ...
Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources
This paper presents an English-Iraqi Arabic speech-to-speech statistical machine translation system using limited resources. In it, we explore the constraints involved, how we endeavored to mitigate such problems as a non-standard orthography and a highly inflected grammar, and discuss leveraging existing plentiful resources for Modern Standard Arabic to assist in this task. These combined tech...
Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
Parallel text is one of the most valuable resources for development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has supported research on statistical machine translation and other NLP applications by creating and distributing a large amount of parallel text resources for the research communities. However, manual translations are v...
Publication date: 2003